Abstract 

The Asian longhorned beetle, Anoplophora glabripennis (Motschulsky) (Coleoptera: Cerambycidae), is one of the most 
economically and ecologically devastating forest insects to invade North America in recent years. Despite its substantial 
impact, limited effort has been expended to define the genetic and molecular make-up of this species. Considering the 
significant role played by late-stadia larvae in host tree decimation, a small-scale EST sequencing project was done using 
a cDNA library constructed from 5th-instar A. glabripennis. The resultant dataset consisted of 599 high quality ESTs that, 
upon assembly, yielded 381 potentially unique transcripts. Each of these transcripts was catalogued as to putative 
molecular function, biological process, and associated cellular component according to the Gene Ontology classification 
system. Using this annotated dataset, a subset of assembled sequences was identified that are putatively associated with 
A. glabripennis development and metamorphosis. This work will contribute to understanding of the diverse molecular 
mechanisms that underlie coleopteran morphogenesis and enable the future development of novel control strategies for 
management of this insect pest. 
Introduction 

The Asian longhorned beetle, Anoplophora glabripennis 
(Motschulsky) (Coleoptera: Cerambycidae), is a pest native 
to eastern China and Korea (Lingafelter and Hoebeke 2002). 
In 1996, this insect was introduced into the United States, 
presumably via wood packing materials used to import cargo 
from Asia (Smith 2003). Since its initial discovery in the 
state of New York, infestations have been detected in 
Illinois (Poland et al. 1998), New Jersey (Haack 2003), and 
the province of Ontario, Canada (CFIA 2005; Haack 2006). 
A. glabripennis grows and reproduces on an array of 
hardwoods including members of the genera Acer (maple), 
Aesculus (horsechestnut), Betula (birch), Celtis (hackberry), 
Platanus (plane tree, sycamore), Populus (poplar), Salix 
(willow), and Ulmus (elm) (Sawyer 2003; Ric et al. 2006). 
Late-instar grubs are especially destructive, forging winding 
galleries into the heartwood of the tree. This feeding 
behavior causes branch dieback and, in cases of heavy or 
persistent infestations, can result in structural 
deterioration and often tree mortality (Haack et al. 1997). 

Burgeoning globalized trade presents a serious challenge in 
that A. glabripennis now has the opportunity to infiltrate 
via multiple points of entry, undermining the efficacy of 
the small number of quarantine facilities currently in 
place. Undetected, this devastating pest could disseminate 
throughout regions of North America where suitable host 
trees exist. Nowak et al. (2001) estimated that, if this 
occurs, up to 1.2 billion urban shade trees with a 
compensatory value of $669 billion could be lost. While 
substantial, these figures do not factor in collateral 
losses such as degraded aesthetics and lowered property 
values, nor do they take into account the potential impact 
on both commercial and natural forest stands. 

At present, eradication efforts center on the identification 
and removal of trees showing signs of A. glabripennis 
infestation. As of 2002, $110.9 million was expended by 
federal, state, and city governments in New York and 
Illinois as part of this program (Stewart 2002). To gauge 
the utility of systemic insecticides as a supplement to this 
effort, scientists from the USDA Forest Service performed 
field evaluations in which trees were treated with either 
imidacloprid or thiacloprid. While successful in reducing 
A. glabripennis populations, neither compound provided 
complete control (Poland et al. 2006). 

Only recently has research begun to shift focus from 
chemical-based control strategies to the development of 
sustainable biocontrol alternatives including 
entomopathogenic fungi, rhabditoid nematode species, 
microsporidia, natural predators and parasitoids (Smith et 
al. 2002, and references therein; Hajek et al. 2006) as well 
as artificial lures and bait/trap tree systems (Li et al. 
1999; Zhang et al. 2002). Moreover, only nominal effort has 
gone into the investigation of genome-based approaches for 
management of A. glabripennis. To facilitate this work, our 
laboratory conducted a small-scale EST sequencing project 
and posted preliminary data to the National Center for 
Biotechnology Information (NCBI) dbEST where it is freely 
accessible to the scientific community. Because of the 
significant role played by late-stadia larvae in host tree 
decimation, 5th-instar A. glabripennis were selected as a 
base for the transcriptome survey described herein. 

Materials and Methods 

Insects 

Fifth-instar A. glabripennis were obtained from a colony 
managed by Michael Smith at the USDA ARS Beneficial Insects 
Introduction Research Unit (Newark, DE). Insects were 
maintained as previously described by Dubois et al. (2002). 
Larvae were ground directly in guanidine-isothiocyanate 
buffer (1 larva per 20 ml buffer) and stored at -40°C prior 
to shipment. 

RNA extraction and library construction 

Upon arrival at the USDA ARS U.S. Horticultural Research 
Laboratory (Ft. Pierce, FL), the majority of samples were 
transferred to an ultra-low temperature freezer (-80°C) for 
archival purposes and a single larva was subjected to 
further processing. Buffer RLT (Qiagen, www.qiagen.com) was 
added to the primary sample at 2.5X the original volume 
along with 150 µl β-mercaptoethanol. The sample was placed 
at -40°C for 10 min then incubated at 37°C for 20 min. 
Intact tissues were further homogenized with a QIAShredder 
and total RNA extracted using an RNeasy Maxi Kit (Qiagen) 
according to the manufacturer's instructions. The eluate was 
precipitated in 0.1 volumes 3M sodium acetate and 2.5 
volumes absolute ethanol at -40°C overnight and the 
resultant pellet resuspended in 35 µl RNase-free water. 
Poly(A)+ RNA was purified using the MicroPoly(A)Pure Kit 
(Ambion, www.ambion.com). A primary library was constructed 
with Stratagene's ZAP-cDNA® Library Construction Kit 
(Stratagene, www.stratagene.com) and subsequently mass 
excised using ExAssist® Helper Phage (Stratagene). The 
library had a titer of 9.75 × 10^5 colony forming units per 
ml with inserts averaging ~1,221 bp. Transformants were 
recovered by random colony selection and grown overnight at 
32°C, 125 rpm in LB Broth supplemented with 100 mg/ml 
ampicillin. 

EST sequencing 

Plasmid DNA was extracted using the Qiagen Liquid Handling 
Robot (Model 9600) in conjunction with the QIAprep 96 Turbo 
Miniprep Kit according to the recommended protocol. 
Single-pass sequencing was performed using the ABI PRISM 
BigDye™ Primer Cycle Sequencing Kit (Applied Biosystems, 
www.appliedbiosystems.com) and a universal T3 primer. 
Reaction products were precipitated, resuspended in 15 µl 
sterile water, and loaded onto an ABI 3730 DNA Analyzer 
(Applied Biosystems). 

Sequence analysis 

Base calling was performed by TraceTuner™ (Paracel, 
www.paracel.com) and low-quality bases (quality score <20) 
were stripped from both ends of each EST. Quality trimming, 
vector trimming, and sequence fragment alignments were 
executed using Sequencher™ software (Gene Codes, 
www.genecodes.com). Sequencher contig assembly parameters 
were set to a minimum overlap of 50 bp and 90% identity. 
Contigs joined by vector sequence were flagged for possible 
misassembly and manually edited. The EST sequences reported 
in this study have been deposited in GenBank's dbEST under 
accession numbers DR108748-DR109303. 
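
The end-trimming rule described above (strip bases with a quality score below 20 from both ends of each read) can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure implemented by TraceTuner or Sequencher; the function names are hypothetical, and the length filter borrows the ">200 bases" criterion quoted in the Results.

```python
def trim_low_quality_ends(bases, quals, min_q=20):
    """Strip bases whose quality is below min_q from both ends of a read.

    Interior low-quality bases are left untouched; only the flanks are removed.
    """
    start, end = 0, len(bases)
    # Advance past the low-quality 5' flank.
    while start < end and quals[start] < min_q:
        start += 1
    # Retreat past the low-quality 3' flank.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end], quals[start:end]


def passes_length_filter(bases, min_len=200):
    """Keep only reads longer than min_len bases after trimming (assumed cutoff)."""
    return len(bases) > min_len


# Toy read: two poor bases on the left, two on the right.
read = "ACGTACGT"
scores = [5, 12, 30, 35, 40, 25, 10, 3]
trimmed, trimmed_scores = trim_low_quality_ends(read, scores)
print(trimmed, trimmed_scores)
```

Trimming only the flanks, rather than masking every low-scoring base, keeps the read contiguous for the subsequent overlap-based assembly.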

Sequence annotation and Gene Ontology classification 

Putative sequence identity was determined based on BLAST 
similarity searches using the NCBI BLAST server 
(www.ncbi.nlm.nih.gov) with comparisons made to both 
non-redundant nucleic acid and protein databases using 
BLASTN and BLASTX, respectively. Matches with an E-value 
< 10^-10 were considered significant and were classified 
according to the Gene Ontology classification system. In the 
case of CG numbers (e.g., CG30437-PA), annotations were 
conferred using the associated CV term provided by FlyBase 
(www.flybase.org). All other sequences were associated with 
a molecular function, biological process, and cellular 
component based on searches of the Gene Ontology database 
(www.geneontology.org). Custom Perl scripts and Excel 
spreadsheets were used for BLAST parsing and table 
generation. The SignalP 3.0 Server was used to predict the 
presence and location of signal peptide cleavage sites 
(www.cbs.dtu.dk/services/SignalP/). 
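
The BLAST parsing step was done with custom Perl scripts that are not reproduced in the paper. As a rough sketch of the same filtering idea, the Python below keeps the best hit per query whose E-value falls below the 10^-10 cutoff. It assumes the standard 12-column tab-delimited BLAST report (query, subject, ..., E-value in column 11, bit score in column 12), which is an assumption about the input format rather than something stated in the text; the query names and numeric fields in the sample are illustrative only.

```python
import csv
import io

E_CUTOFF = 1e-10  # significance threshold quoted in the Methods


def best_significant_hits(tabular_text, cutoff=E_CUTOFF):
    """Return {query: (subject, evalue)} for the lowest-E-value hit below cutoff.

    Assumes 12-column tab-delimited BLAST output with the E-value in column 11.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= cutoff:
            continue  # not significant under the stated threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Illustrative rows only; the identity, length, and coordinate fields are made up.
sample = (
    "contigA\tBAA78480\t45.0\t300\t150\t5\t1\t900\t10\t310\t5e-25\t120\n"
    "contigA\tXP_000001\t40.0\t280\t160\t6\t1\t840\t5\t285\t2e-12\t80\n"
    "contigB\tXP_381775\t30.0\t150\t100\t4\t1\t450\t20\t170\t6e-05\t48\n"
)
print(best_significant_hits(sample))
```

Queries with no hit below the cutoff (contigB here) simply do not appear in the result, which mirrors the "no significant similarity" category used later in the paper.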



Results and Discussion 

General overview 

A single A. glabripennis larva was used for this study so 
that allelic variations within an individual (EST allele 
counts) could be distinguished from those that may exist 
across a population (population allele frequency). 5'-end 
one-pass sequencing of the cDNA library yielded 672 ESTs, of 
which 599 were designated as high quality (i.e., >200 bases 
with a TraceTuner™ score of 20 or better). ESTs ranged in 
size from 206 to 828 bases with an average length of 650 
bases. Upon assembly, these sequences were condensed to form 
47 contiguous sequences (contigs), leaving 334 as 
singletons. Contigs and singletons together yielded 381 
unique sequences that putatively represent distinct 
transcripts. Contigs ranged in size from 392 to 2,240 bases 
with an average length of 954 bases, whereas singletons 
varied from 206 to 828 bases with an average length of 647 
bases. 
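
The assembly figures above fit together arithmetically, and a short bookkeeping sketch makes the relationships explicit. The counts are taken from the text; the derived quantities (ESTs absorbed into contigs, mean ESTs per contig, pass rate) are simple consequences of those counts and are not reported by the authors. The function name is hypothetical.

```python
def summarize_assembly(total_reads, high_quality, contigs, singletons):
    """Derive basic assembly bookkeeping from the reported counts."""
    unique_sequences = contigs + singletons
    ests_in_contigs = high_quality - singletons  # every non-singleton EST sits in a contig
    return {
        "unique_sequences": unique_sequences,
        "ests_in_contigs": ests_in_contigs,
        "mean_ests_per_contig": ests_in_contigs / contigs,
        "high_quality_fraction": high_quality / total_reads,
    }


summary = summarize_assembly(total_reads=672, high_quality=599,
                             contigs=47, singletons=334)
for key, value in summary.items():
    print(key, value)
```

The sum of contigs and singletons reproduces the 381 putative transcripts quoted in the text.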

Highly redundant transcripts 

The calculated redundancy of the library was ~32%, with nine 
contigs found to be highly redundant (i.e., containing ≥5 
ESTs; Table 1) and accounting for 24% of the total ESTs. Two 
of the contigs, representing 20 ESTs, had significant 
sequence similarity to mitochondrial genes and were 
subsequently discarded from the transcriptome survey. Nearly 
half of the highly redundant contigs had no significant 
similarity (E ≥ 10^-10) to any sequence listed within NCBI's 
nr database. These transcripts correspond to potentially 
novel genes specific to A. glabripennis and warrant further 
examination. The remaining three contigs returned 
significant matches to proteins previously identified in 
other coleopteran species. 

Table 1. Most abundantly represented transcripts in the A. glabripennis cDNA library 

Contig | ESTs | Accession No. | GenBank Descriptor [Source Organism] | E-value 
[0243] | 51 | CAM36311 | hypothetical protein [Thermobia domestica] | 4.00E-09 
[0244] | 22 | BAA78480 | 56 kDa early-staged encapsulation-inducing protein [Tenebrio molitor] | 5.00E-25 
[0259] | 19 | AAM44045 | arylphorin-like hexamerin [Apriona germari] | 0 
[0241]* | 15 | YP_659513 | cytochrome c oxidase subunit I [Anoplophora glabripennis] | 0 
[0255] | 12 | XP_381775 | hypothetical protein FG01599.1 [Gibberella zeae PH-1] | 6.00E-05 
[0250] | 9 | BAA78480 | 56 kDa early-staged encapsulation-inducing protein [Tenebrio molitor] | 4.00E-05 
[0275] | 6 | BAA78480 | 56 kDa early-staged encapsulation-inducing protein [Tenebrio molitor] | 0.002 
[0270]* | 5 | YP_659517 | cytochrome c oxidase subunit III [Anoplophora glabripennis] | 5.00E-89 
[0278] | 5 | XP_973799 | PREDICTED: similar to CG6806-PA [Tribolium castaneum] | E-116 

NOTE: items marked with an asterisk (the mitochondrial contigs) were treated as contaminating sequences and were removed prior to further annotation. 

The most frequently represented of the contigs with 
significant matches, WHALB[0244], comprised 22 ESTs and 
matched most closely to a 56 kDa early-staged 
encapsulation-relating protein previously identified from 
Tenebrio molitor larvae. Upon assessment of alignment 
integrity, two sequence variants differing by 14 single 
nucleotide polymorphisms (SNPs) were resolved. Consequently, 
WHALB[0244] was dissolved and realigned to form two 
assembled sequences, WHALB[0244a] containing 10 ESTs and 
WHALB[0244b] containing 12 ESTs. Both variants possessed a 
single open reading frame (ORF) encoding 456 amino acid 
residues, the first 15 of which are thought to constitute a 
leader/signal peptide. Amino acid abundance analyses of the 
translated protein sequences revealed a preponderance of Gln 
(99 residues or 22%), Gly (46-47 residues or 10%), and Leu 
(45 residues or 10%) within the coding domain, while Cys and 
His levels were negligible (<0.5%). Although comparable to 
the T. molitor 56-kDa encapsulation-relating protein with 
respect to amino acid abundance and overall sequence 
similarity, significant distinctions were noted, including 
an eight amino acid insertion shortly after the signal 
peptide and 11 deletions scattered along the length of the 
coding domain (Cho et al. 1999). This would seem to indicate 
that WHALB[0244a] and WHALB[0244b] represent novel proteins 
which may play a role in A. glabripennis cellular defense. 
As such, both coding domains have been deposited into 
GenBank under accession numbers EF583868 and EF583869. 

The second most highly redundant contig, WHALB[0259], 
contained 19 ESTs and appeared to span the coding region for 
an arylphorin-like hexameric storage protein, denoted AglHEX 
(accession number EF583870). AglHEX had an ORF of 2,151 
nucleotides, encoding a protein precursor 717 amino acids in 
length. The N-terminus of this precursor most likely 
contains a cleavage site between residues 17 and 18 
(AYS17/A18V), indicating a signal peptide for transmembrane 
transport. In addition, the following highly conserved 
larval storage protein (LSP) signature sequence patterns 
were noted: LSP signature-1 motif 
(Y(F/Y/W)XED(L/I/V/M)X2NX6HX3P) and LSP signature-2 motif 
(TX2RDPX(F/Y)(F/Y/W)), with the corresponding sequences in 
AglHEX being YYLEDVGLNAFYYYYHLYYP (residues 218-237) and 
TSMRDPVF (residues 421-428) (Zhu et al. 2002). Contig 
WHALB[0278], comprising five ESTs, showed greatest sequence 
similarity to a predicted protein from Tribolium castaneum 
annotated as similar to Drosophila melanogaster CG6806-PA. 
When queried against FlyBase, it was determined that this 
transcript also corresponded to an LSP [partial LSP-2; ~400 
amino acids missing from the protein's N-terminus]. Of 307 
in-frame residues, WHALB[0278] contained five Met (2%) and 
48 aromatic amino acids (16%), a composition indicative of 
arylphorin-like storage proteins (Telfer and Kunkel 1991). 
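
The LSP signature patterns quoted above translate directly into regular expressions, where X becomes a wildcard, a subscript becomes a repeat count, and a slash-separated group becomes a character class. The sketch below is one way to scan a translated ORF for such motifs; it is not the authors' procedure, the helper name is hypothetical, and the flanking residues and the second test peptide are invented for illustration. Only the signature-1 match from AglHEX is taken from the text.

```python
import re

# Y(F/Y/W)XED(L/I/V/M)X2NX6HX3P
LSP_SIGNATURE_1 = re.compile(r"Y[FYW].ED[LIVM].{2}N.{6}H.{3}P")
# TX2RDPX(F/Y)(F/Y/W)
LSP_SIGNATURE_2 = re.compile(r"T.{2}RDP.[FY][FYW]")


def find_motif(pattern, protein):
    """Return (start, end, match) tuples using 1-based inclusive coordinates."""
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(protein)]


# The 20-residue AglHEX segment reported in the text, padded with made-up flanks.
fragment = "MK" + "YYLEDVGLNAFYYYYHLYYP" + "GA"
print(find_motif(LSP_SIGNATURE_1, fragment))

# A purely hypothetical peptide that satisfies signature 2.
print(find_motif(LSP_SIGNATURE_2, "AATAARDPAFWAA"))
```

Reporting 1-based inclusive coordinates keeps the output comparable with residue ranges as they are normally quoted for protein sequences.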

Functional classification of 5th-instar A. glabripennis ESTs 

A BLASTN search of the entire dataset revealed six contigs 
and five singletons with significant sequence similarity 
either to non-nuclear transcripts (e.g., rRNA genes and 
mitochondrial genes) or contaminating organismal transcripts 
(e.g., transcripts of plant, bacterial, or trematode 
origin). These assembled sequences, representing 34 ESTs, 
were removed from the dataset prior to further analysis. 

A total of 258 sequences (29 contigs and 229 singletons; 58% 
ESTs) showed significant sequence similarity to known 
proteins. Four sequences (3 contigs and 1 singleton; 4% 
ESTs) had hits with E-values below 10^-150, 35 sequences (9 
contigs and 26 singletons; 8% ESTs) had hits with E-values 
between 10^-[?] and 10^-[?], 114 sequences (11 contigs and 
103 singletons; 22% ESTs) had hits with E-values between 
10^-[?] and 10^-[?], 48 sequences (3 contigs and 45 
singletons; 9% ESTs) had hits with E-values between 10^-[?] 
and 10^-[?], and 56 sequences (3 contigs and 53 singletons; 
14% ESTs) had hits with E-values between 10^-[?] and 
10^-[?]. The remainder of the sequences (9 contigs and 85 
singletons; 42% ESTs) failed to return meaningful matches (E 
≥ 10^-10). The best match (i.e., hit with the lowest 
E-value; E < 10^-10) most often corresponded to sequences 
derived from the Insecta, with 225 ESTs (69%) showing 
greatest similarity to T. castaneum, followed by 8 ESTs (2%) 
for Drosophila spp., and 6 ESTs (2%) for Apis mellifera. Of 
the remaining ESTs, 63 (19%) showed greatest similarity to 
coleopteran species other than T. castaneum, 12 (4%) to 
non-coleopteran insect species, and 14 (4%) most closely 
resembled sequences derived from non-insect source material. 
Sequences with a significant hit were further characterized 
using the controlled vocabularies of the Gene Ontology. 
Overviews which include hierarchical listings of associated 
molecular functions, biological processes, and cellular 
components are provided in Tables 2, 3, and 4, respectively. 
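
The taxonomic percentages quoted above can be checked by normalizing each EST count against the sum of all best-hit categories. The short sketch below does that; the category labels are paraphrased from the text, the counts are the ones reported, and the assumption that the six categories together form the denominator is an inference from the numbers rather than something the authors state.

```python
def percent_breakdown(counts):
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


best_hit_source = {
    "Tribolium castaneum": 225,
    "Drosophila spp.": 8,
    "Apis mellifera": 6,
    "other Coleoptera": 63,
    "other non-coleopteran insects": 12,
    "non-insect": 14,
}

print(sum(best_hit_source.values()))
print(percent_breakdown(best_hit_source))
```

Under that assumption the rounded shares agree with the percentages given in the paragraph.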

Transcripts putatively associated with A. glabripennis development and metamorphosis 

Table 5 highlights a subset of developmental and 
metamorphosis-related transcripts identified in the A. 
glabripennis library. A brief discussion illustrating the 
role(s) of several of these transcripts is offered below 
along with select references. 

Autophagic cell death 

WHALB004-85 and WHALB007-57 encompassed the complete coding 
domains of a putative peptidyl-prolyl cis-trans isomerase 
(PPIase) and a eukaryotic translation initiation factor, 
respectively. Using serial analysis of gene expression 
(SAGE), Gorski et al. (2003) substantiated the involvement 
of equivalent proteins (e.g., Dmel\Cyp1 and Dmel\eIF-5A) in 
autophagic cell death. While generally considered a defense 
mechanism, this process is believed to be imperative for 
organelle turnover and recycling during the transition from 
late instar/pre-pupa to pupa in holometabolous insects such 
as A. glabripennis. 
Table 2. Molecular function 

Gene Ontology (GO) Term^a | Number of ESTs^b | % of total ESTs represented | Number of contigs | Number of singlets 
[p] Antioxidant activity | 2 | 0.36% | 0 | 2 
[p] Binding 
  [c] Amine binding | 1 | 0.18% | 0 | 1 
  [c] Carbohydrate binding | 5 | 0.89% | 0 | 5 
  [c] Cofactor binding | 2 | 0.36% | 0 | 2 
  [c] Hormone binding | 2 | 0.36% | 1 | 0 
  [c] Ion binding | 22 | 3.91% | 2 | 17 
  [c] Isoprenoid binding | 3 | 0.53% | 0 | 3 
  [c] Lipid binding | 5 | 0.89% | 0 | 5 
  [c] Nucleic acid binding | 29 | 5.16% | 3 | 22 
  [c] Nucleotide binding | 26 | 4.63% | 4 | 18 
  [c] Odorant binding | 1 | 0.18% | 0 | 1 
  [c] Oxygen binding | 2 | 0.36% | 1 | 0 
  [c] Protein binding | 57 | 10.14% | 7 | 42 
  [c] Tetrapyrrole binding | 5 | 0.89% | 0 | 5 
  [c] Vitamin binding | 4 | 0.71% | 0 | 4 
  [c] No further information provided | 2 | 0.36% | 1 | 0 
[p] Catalytic activity 
  [c] Deaminase activity | 3 | 0.53% | 1 | 1 
  [c] Helicase activity | 1 | 0.18% | 0 | 1 
  [c] Hydrolase activity 
    [i] Hydrolase activity, acting on acid anhydrides | 18 | 3.20% | 4 | 10 
    [i] Hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds | 3 | 0.53% | 0 | 3 
    [i] Hydrolase activity, acting on ester bonds | 9 | 1.60% | 1 | 7 
    [i] Hydrolase activity, acting on ether bonds | 1 | 0.18% | 0 | 1 
    [i] Hydrolase activity, acting on glycosyl bonds | 2 | 0.36% | 0 | 2 
    [i] Peptidase activity | 39 | 6.94% | 3 | 13 
  [c] Isomerase activity | 7 | 1.25% | 1 | 5 
  [c] Ligase activity | 9 | 1.60% | 0 | 9 
  [c] Lyase activity | 7 | 1.25% | 0 | 7 
  [c] Oxidoreductase activity | 22 | 3.91% | 2 | 18 
  [c] Small protein conjugating enzyme activity | 2 | 0.36% | 0 | 2 
  [c] Transferase activity | 19 | 3.38% | 1 | 17 
[p] Chaperone regulator activity | 1 | 0.18% | 0 | 1 
[p] Enzyme regulator activity 
  [c] Enzyme activator activity | 2 | 0.36% | 0 | 2 
  [c] Enzyme inhibitor activity | 12 | 2.14% | 3 | 6 
  [c] GTPase regulator activity | 3 | 0.53% | 0 | 3 
  [c] Kinase regulator activity | 3 | 0.53% | 0 | 3 
  [c] Phosphatase regulator activity | 1 | 0.18% | 0 | 1 
[p] Motor activity | 1 | 0.18% | 0 | 1 
[p] Nutrient reservoir activity | 24 | 4.27% | 2 | 0 
[p] Signal transducer activity | 7 | 1.25% | 0 | 7 
[p] Structural molecule activity 
  [c] Structural constituent of cuticle | 1 | 0.18% | 0 | 1 
  [c] Structural constituent of cytoskeleton | 10 | 1.78% | 1 | 8 
  [c] Structural constituent of muscle | 1 | 0.18% | 0 | 1 
  [c] Structural constituent of peritrophic membrane (sensu Insecta) | 1 | 0.18% | 0 | 1 
  [c] Structural constituent of ribosome | 26 | 4.63% | 4 | 17 
  [c] No further information provided | 6 | 1.07% | 1 | 4 
[p] Transcription regulator activity | 7 | 1.25% | 0 | 7 
[p] Translation regulator activity | 7 | 1.25% | 2 | 3 
[p] Transporter activity 
  [c] Amine transporter activity | 1 | 0.18% | 0 | 1 
  [c] Auxiliary transport protein activity | 1 | 0.18% | 0 | 1 
  [c] Carbohydrate transporter activity | 1 | 0.18% | 0 | 1 
  [c] Carrier activity | 9 | 1.60% | 1 | 7 
  [c] Intracellular transporter activity | 1 | 0.18% | 0 | 1 
  [c] Ion transporter activity | 2 | 0.36% | 0 | 2 
  [c] Lipid transporter activity | 2 | 0.36% | 0 | 2 
  [c] Neurotransmitter transporter activity | 1 | 0.18% | 0 | 1 
  [c] Organic acid transporter activity | 1 | 0.18% | 0 | 1 
  [c] Oxygen transporter activity | 25 | 4.45% | 2 | 1 
  [c] Water transporter activity | 1 | 0.18% | 0 | 1 
  [c] No further information provided | 5 | 0.89% | 0 | 5 
[p] Molecular function unknown | 89 | 15.84% | 4 | 61 

a Classification is hierarchical: indented terms are children [c] of parent terms [p] listed above. All functional assignments of 
5th-instar A. glabripennis ESTs described here are "inferred from electronic annotation" (IEA) using the top 5 BLASTX hits 
with an E-value of < 10^-10 generated from NCBI's nr database. The definition term associated with each sequence was entered 
into both FlyBase and AmiGO where it was given a molecular function designation according to the Gene Ontology 
Consortium. 

b Because a single EST can be associated with several GO terms, the total number of ESTs may be larger than the actual number 
of ESTs analyzed. However, no single EST was catalogued under the same GO term more than once. 
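
The counting rule in footnote b (an EST may appear under several GO terms, but never twice under the same term) corresponds to collecting a set of EST identifiers per term. The sketch below illustrates that bookkeeping. The EST identifiers and term assignments are hypothetical, and computing each percentage against the total number of EST-to-term assignments is an assumption suggested by the table's figures rather than a method described in the paper.

```python
from collections import defaultdict


def count_ests_per_term(annotations):
    """Count distinct ESTs per GO term from (est_id, go_term) pairs.

    Duplicate (est_id, go_term) pairs collapse because each term holds a set.
    """
    ests_by_term = defaultdict(set)
    for est_id, go_term in annotations:
        ests_by_term[go_term].add(est_id)
    counts = {term: len(ests) for term, ests in ests_by_term.items()}
    total_assignments = sum(counts.values())
    percents = {term: 100 * n / total_assignments for term, n in counts.items()}
    return counts, percents


# Hypothetical annotations: est2 maps to two terms; est1's duplicate is ignored.
pairs = [
    ("est1", "Protein binding"),
    ("est1", "Protein binding"),
    ("est2", "Protein binding"),
    ("est2", "Peptidase activity"),
    ("est3", "Ion binding"),
]
counts, percents = count_ests_per_term(pairs)
print(counts)
print(percents)
```

Because one EST can contribute to several terms, the column total can exceed the number of ESTs analyzed, exactly as the footnote warns.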



Bristle morphogenesis 

Singletons WHALB002-36 and WHALB004-32 corresponded to D. 
melanogaster singed [CG32858-PA, isoform A] and darkener of 
apricot (Doa) [CG33553-PE, isoform E]. Although most often 
associated with neurosensory bristle development, these 
proteins are thought to be critical in an array of 
developmental processes including antennal morphogenesis, 
compound eye development, salivary gland autophagic cell 
death, and sex differentiation (Yun et al. 1994). 

Nervous system development 

Analysis of the primary sequence of WHALB[0248] exposed what 
appears to be a "false contig" (i.e., the product of two 
distinct transcripts erroneously conjoined through alignment 
of analogous sequence). The contig was subsequently 
dissolved and each EST compared to the nr database 
separately. Based on results of the query, WHALB007-9 was 
retained along with the BLASTX match definition listed in 
Table 5. Because WHALB007-9 and WHALB[0269] were assigned 
equivalent designations, it was necessary to ascertain 
whether these sequences could be assembled using less 
stringent parameters. However, superposition of the 
translated sequences onto D. melanogaster CG4264-PA, isoform 
A isoform 1 revealed an 8 amino acid gap, corresponding to 
Dmel TQASIEID (residues 278-285), that failed to link the A. 
glabripennis sequences. While not contiguous, these 
assembled sequences represent transcripts that putatively 
encode heat shock protein cognate 4 (Hsc70-4), a protein 
which functions in nerve projection events such as axon 
guidance, axonal fasciculation, neurotransmitter secretion 
and synaptic vesicle transport. 
Table 3. Biological process 

Gene Ontology (GO) Term^a | Number of ESTs^b | % of total ESTs represented | Number of contigs | Number of singlets 
[p] Cellular process 
  [c] Cell adhesion | 3 | 0.46% | 1 | 1 
  [c] Cell communication | 36 | 5.53% | 3 | 30 
  [c] Cell differentiation 
    [i] Cell fate commitment | 1 | 0.15% | 0 | 1 
    [i] Neuron differentiation | 6 | 0.92% | 2 | 2 
    [i] Oocyte differentiation | 6 | 0.92% | 1 | 4 
    [i] Photoreceptor cell differentiation | 4 | 0.61% | 0 | 4 
    [i] No further information provided | 1 | 0.15% | 0 | 1 
  [c] Cellular physiological process 
    [i] Cell cycle | 7 | 1.08% | 0 | 7 
    [i] Cell death | 4 | 0.61% | 0 | 4 
    [i] Cell division | 2 | 0.31% | 0 | 2 
    [i] Cell homeostasis | 13 | 2.00% | 2 | 8 
    [i] Cell motility | 4 | 0.61% | 0 | 4 
    [i] Cell organization and biogenesis | 21 | 3.23% | 4 | 13 
    [i] Cell proliferation | 4 | 0.61% | 0 | 4 
    [i] Cellular metabolism 
      [ii] Alkene metabolism | 1 | 0.15% | 0 | 1 
      [ii] Amine metabolism | 12 | 1.84% | 1 | 10 
      [ii] Cofactor metabolism | 5 | 0.77% | 0 | 5 
      [ii] Generation of precursor metabolites and energy | 4 | 0.61% | 0 | 4 
      [ii] Nucleobase, nucleoside, nucleotide and nucleic acid metabolism | 22 | 3.38% | 2 | 18 
      [ii] Organic acid metabolism | 1 | 0.15% | 0 | 1 
      [ii] Vitamin metabolism | 1 | 0.15% | 0 | 1 
    [i] Cellularization | 3 | 0.46% | 0 | 3 
    [i] Chromosome segregation | 4 | 0.61% | 0 | 4 
    [i] Transport | 63 | 9.68% | 5 | 33 
    [i] No further information provided | 4 | 0.61% | 1 | 2 
[p] Development 
  [c] Aging | 2 | 0.31% | 0 | 2 
  [c] Appendage development | 3 | 0.46% | 0 | 3 
  [c] Embryonic development | 9 | 1.38% | 2 | 5 
  [c] Morphogenesis | 5 | 0.77% | 0 | 5 
  [c] Organ development | 12 | 1.84% | 2 | 8 
  [c] Pattern specification | 7 | 1.08% | 1 | 5 
  [c] Pigmentation during development | 2 | 0.31% | 1 | 0 
  [c] Post-embryonic development | 4 | 0.61% | 1 | 2 
  [c] Sex differentiation | 1 | 0.15% | 0 | 1 
  [c] System development 
    [i] Nervous system development | 13 | 2.00% | 3 | 7 
  [c] Tissue development | 11 | 1.69% | 1 | 9 
[p] Growth | 1 | 0.15% | 0 | 1 
[p] Physiological process 
  [c] Localization | 2 | 0.31% | 0 | 2 
  [c] Metabolism 
    [i] Biosynthesis 
      [ii] Organismal biosynthesis 
        [iii] Cuticle biosynthesis | 4 | 0.61% | 0 | 4 
    [i] Cellular metabolism 
      [ii] Aromatic compound metabolism | 2 | 0.31% | 1 | 0 
      [ii] Hormone metabolism | 1 | 0.15% | 0 | 1 
      [ii] One-carbon compound metabolism | 2 | 0.31% | 0 | 2 
      [ii] Oxygen and reactive oxygen species metabolism | 2 | 0.31% | 0 | 2 
      [ii] Pheromone metabolism | 1 | 0.15% | 0 | 1 
      [ii] Phosphorus metabolism | 2 | 0.31% | 0 | 2 
    [i] Macromolecule metabolism 
      [ii] Carbohydrate metabolism | 10 | 1.54% | 0 | 10 
      [ii] Protein metabolism | 107 | 16.44% | 16 | 54 
    [i] Primary metabolism 
      [ii] Lipid metabolism | 16 | 2.46% | 1 | 14 
    [i] No further information provided | 5 | 0.77% | 1 | 3 
  [c] Organismal physiological process 
    [i] Molting cycle | 1 | 0.15% | 0 | 1 
    [i] Muscle contraction | 4 | 0.61% | 1 | 2 
    [i] Organismal movement | 1 | 0.15% | 0 | 1 
  [c] Regulation of physiological process 
    [i] Regulation of cellular physiological process | 30 | 4.61% | 4 | 22 
[p] Regulation of biological process 
  [c] Negative regulation of biological process | 1 | 0.15% | 0 | 1 
  [c] Regulation of catalytic activity | 1 | 0.15% | 0 | 1 
[p] Reproduction | 13 | 2.00% | 0 | 13 
[p] Response to stimulus 
  [c] Behavior | 14 | 2.15% | 4 | 6 
  [c] Response to abiotic stimulus | 8 | 1.23% | 1 | 6 
  [c] Response to biotic stimulus | 14 | 2.15% | 2 | 10 
  [c] Response to external stimulus | 1 | 0.15% | 0 | 1 
  [c] Response to stress | 8 | 1.23% | 2 | 4 
  [c] Sensory perception | 3 | 0.46% | 0 | 3 
[p] Biological process unknown | 96 | 14.75% | 4 | 68 

a Classification is hierarchical: indented terms are children [c] of parent terms [p] listed above. All functional assignments of 
5th-instar A. glabripennis ESTs described here are "inferred from electronic annotation" (IEA) using the top 5 BLASTX hits 
with an E-value of < 10^-10 generated from NCBI's nr database. The definition term associated with each sequence was entered into 
both FlyBase and AmiGO where it was given a biological process designation according to the Gene Ontology Consortium. 

b Because a single EST can be associated with several GO terms, the total number of ESTs may be larger than the actual number 
of ESTs analyzed. However, no single EST was catalogued under the same GO term more than once. 
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Table 4. Cellular component 



Gene Ontology (GO) Term 3 


Number 
of ESTs b 


% of total ESTs 
represented 


Number of 
contigs 


Number of 
singlets 


[p] Extracellular matrix 


1 


0.23% 


0 


1 


[p] Extracellular region 


47 


10.73% 


5 


17 


[P] Cell 


[c] Cell part 


[i] Apical part of cell 


1 


0.23% 


0 


1 


[i] Cell projection 


[ii] Flagellum 


2 


0.46% 


0 


2 


[ii] Neuron projection 


3 


0.68% 


0 


3 


[i] Cell soma 


1 


0.23% 


0 


1 


[i] Intracellular 


[ii] Intracellular part 


[iii] Cytoplasm 


30 


6.85% 


4 


22 


[iii] Cytoplasmic part 


[iv] Cell cortex 


1 


0.23% 


0 


1 


[iv] Contractile fiber 


3 


0.68% 


1 


1 


[iv] Cytoplasmic vesicle 


6 


1.37% 


0 


6 


[iv] Cytosol 


[v] Cytosolic part 


[vi] Cytosolic large 
ribosomal subunit (sensu Eukaryota) 


14 


3.20% 


2 


9 


[vi] Cytosolic small 
ribosomal subunit (sensu Eukaryota) 


10 


2.28% 


1 


8 


[v] No further information provided 


10 


2.28% 


3 


4 


[iv] Eukaryotic 43S preinitiation complex 


2 


0.46% 


0 


2 


[iv] Fusome 


1 


0.23% 


0 


1 


[iv] Vacuole 


1 


0.23% 


0 


1 


[iii] Intracellular organelle 


[iv] Intracellular membrane-bound organelle 


[iv] Endoplasmic reticulum 


7 


1.60% 


1 


5 


[iv] Endosome 


4 


0.91% 


0 


4 


[iv] Golgi apparatus 


2 


0.46% 


0 


2 


[iv] Mitochondrion 


15 


3.42% 


2 


1 1 


[iv] Nucleus 


25 


5.71% 


3 


19 


[iv] Intracellular non-membrane-bound organelle 


[iv] Chromosome 


4 


0.91% 


0 


4 


[iv] Cytoskeleton 


3 


0.68% 


0 


3 


[iv] Rhabdomere 


1 


0.23% 


0 


1 


[iv] Ribosome 


9 


2.05% 


1 


7 


[iii] Proteasome complex (sensu Eukaryota) 


4 


0.91% 


0 


4 


[iii] Proton-transporting ATP synthase 
complex 


2 


0.46% 


0 


2 


[iii] Respiratory chain complex 1 (sensu 
Eukaryota) 


2 


0.46% 


0 


2 


[iii] Respiratory chain complex III (sensu 
Eukaryota) 


1 


0.23% 


0 


1 


[ii] No further information provided 


8 


1.83% 


1 


6 


[i] Membrane 
Table 4 (cont.) 

Gene Ontology (GO) Term a | Number of ESTs b | % of total ESTs represented | Number of contigs | Number of singlets

      [ii] Coated membrane | 1 | 0.23% | 0 | 1
      [ii] Membrane part
        [iii] Intrinsic to membrane
          [iv] Integral to membrane | 12 | 2.74% | 1 | 10
      [ii] Organelle membrane
        [iii] Mitochondrial membrane | 3 | 0.68% | 1 | 1
      [ii] Plasma membrane | 8 | 1.83% | 0 | 8
      [ii] No further information provided | 4 | 0.91% | 0 | 4
[p] Envelope
  [c] Organelle envelope
    [i] Mitochondrial envelope | 3 | 0.68% | 1 | 1
[p] Macromolecular complex
    [i] Protein complex
      [ii] ATP-binding cassette (ABC) transporter complex | 1 | 0.23% | 0 | 1
      [ii] Cyclin-dependent protein kinase holoenzyme complex | 1 | 0.23% | 0 | 1
      [ii] Eukaryotic translation elongation factor 1 complex | 4 | 0.91% | 2 | 0
      [ii] Exosome (RNase complex) | 1 | 0.23% | 0 | 1
      [ii] Ferritin complex | 5 | 1.14% | 1 | 2
      [ii] Larval serum protein complex | 24 | 5.48% | 2 | 0
      [ii] Oligosaccharyl transferase complex | 2 | 0.46% | 1 | 0
      [ii] Protein serine/threonine phosphatase complex | 1 | 0.23% | 0 | 1
      [ii] Ubiquitin ligase complex | 1 | 0.23% | 0 | 1
      [ii] Unlocalized protein complex | 1 | 0.23% | 0 | 1
[p] Cellular component unknown | 146 | 33.33% | 9 | 108

a Classification is hierarchical: indented terms are children [c] of parent terms [p] listed above. All functional assignments of 
5th-instar A. glabripennis ESTs described here are "inferred from electronic annotation" (IEA) using the top 5 BLASTX hits with 
an E-value of < E-10 generated from NCBI's nr database. The definition term associated with each sequence was entered into both 
FlyBase and AmiGO, where it was given a cellular component designation according to the Gene Ontology Consortium. 
b Because a single EST can be associated with several GO terms, the total number of ESTs may be larger than the actual number of 
ESTs analyzed. However, no single EST was catalogued under the same GO term more than once. 
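
The counting rule described in the Table 4 footnotes can be illustrated with a minimal Python sketch. The EST identifiers and term assignments below are hypothetical, and the function names are introduced here only for illustration. The denominator of 438 is not stated in this section; it is inferred from the table itself, whose percentages are consistent with it (for example, 146/438 = 33.33%).

```python
from collections import defaultdict


def tally_go_terms(assignments):
    """Count ESTs per GO term; each EST counts at most once per term."""
    ests_per_term = defaultdict(set)
    for est_id, terms in assignments.items():
        for term in terms:
            # A set silently ignores a repeated assignment of the same term.
            ests_per_term[term].add(est_id)
    return {term: len(ests) for term, ests in ests_per_term.items()}


def percent_of_total(count, total):
    """Express a term count as a percentage of the chosen denominator."""
    return round(100.0 * count / total, 2)


# Hypothetical assignments: EST identifier -> GO cellular component terms.
example = {
    "EST_A": ["Nucleus", "Cytoplasm"],
    "EST_B": ["Nucleus", "Nucleus"],  # duplicate term is counted once
    "EST_C": ["Cellular component unknown"],
}
counts = tally_go_terms(example)

# Summed term counts (4) exceed the number of ESTs (3), as footnote b notes.
summed_counts = sum(counts.values())
```

Because an EST may carry several terms, the column of EST counts should not be expected to sum to the number of ESTs analyzed.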



WHALB[0262] consisted of a single ORF containing the 
entire coding domain of a putative protein paralleling D. 
melanogaster ciboulot (cib). Like Dmel\cib, the translated 
sequence of the A. glabripennis coding domain is highly 
congruent, at least at the amino acid level, to β-thymosins 
(e.g., Bombyx mori thymosin isoform 2, 5.00E-40; accession 
no. ABF51487). In particular, an actin-binding 
motif found in both β-thymosins and cib was identified as 
KLKHTETQEK 7 „ 3 within the WHALB[0262] ORF 
(Nachmias 1993). However, as observed for Dmel\cib, 
Agla\cib may possess biochemical properties comparable 
to those of profilin rather than thymosin, with binding to 
monomeric actin occurring exclusively at the barbed (or plus) 
end of the filament and enhanced actin-based motility 
observed in vitro (Loisel et al. 1999). This regulation of 
actin assembly is thought to be a key factor governing 
axonal outgrowth during the differentiation events that 
underlie brain metamorphosis (Boquet et al. 2000). 
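
Locating a short motif such as KLKHTETQEK within a translated ORF can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy nucleotide sequence is hypothetical (it is not the WHALB[0262] sequence), translation uses the standard genetic code in a single reading frame, and an exact string match stands in for the alignment-based comparison that a real analysis would use.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]  # KeyError on non-ACGT input
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def find_motif(protein, motif):
    """Return 1-based (start, end) positions of the first exact match, or None."""
    idx = protein.find(motif)
    if idx == -1:
        return None
    return idx + 1, idx + len(motif)


# Hypothetical ORF: start codon, two filler codons, the motif, a stop codon.
toy_orf = "ATG" + "GCTGAC" + "AAACTGAAGCACACCGAGACACAGGAAAAA" + "TAA"
toy_protein = translate(toy_orf)
motif_span = find_motif(toy_protein, "KLKHTETQEK")
```

Residue positions are reported 1-based so that they can be compared directly with positions quoted for protein sequences.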

Muscle development 

WHALB[0331] and WHALB003-14 were catalogued 
under the transcript class "muscle development". Although 
annotated based on non-traceable author statements 
listed in either NCBI's GenBank or FlyBase, these 
assembled sequences clearly possess sequence similarity 
to commonly accepted muscle-associated proteins such as 
muscle protein 20-like protein and muscle LIM protein. 

Cuticle development and puparium 
formation 

Transcripts that potentially code for proteins involved in 
cuticle biosynthesis were also identified within the A. 
Table 5. Transcripts putatively associated with A. glabripennis development and metamorphosis 

Assembled Sequence Identifier | BLASTX Match Definition | Accession No. | E-value

Autophagic cell death
WHALB004-85 | PREDICTED: similar to CG9916-PA isoform 1 [Tribolium castaneum] | XP_966308 | 1.00E-85
WHALB007-57 | PREDICTED: similar to CG3186-PA, isoform A [Tribolium castaneum] | XP_974942 | 1.00E-87

Bristle morphogenesis
WHALB002-36 | PREDICTED: similar to CG33553-PE, isoform E [Tribolium castaneum] | XP_970939 | E-108
WHALB004-32 | PREDICTED: similar to CG32858-PA, isoform A [Tribolium castaneum] | XP_972494 | 4.00E-89

Cuticle development
WHALB[0273] | PREDICTED: similar to CG8063-PA [Tribolium castaneum] | XP_969206 | E-139
WHALB002-14 | vesicle coat complex COPII GTPase subunit SAR1 [Aedes aegypti] | ABF18297 | 7.00E-86
WHALB007-12 | PREDICTED: similar to glucose dehydrogenase [Tribolium castaneum] | XP_968177 | 1.00E-65

Imaginal disc morphogenesis
WHALB002-13 | PREDICTED: similar to CG6235-PE, isoform E [Tribolium castaneum] | XP_970874 | 4.00E-94
WHALB005-39 | PREDICTED: similar to CG2723-PA [Tribolium castaneum] | XP_967178 | 4.00E-42
WHALB006-40 | effete CG7425-PA [Drosophila melanogaster] | NP_731941 | 5.00E-76
WHALB007-95 | PREDICTED: similar to CG7734-PA, isoform A [Tribolium castaneum] | XP_971641 | 4.00E-21

Muscle development
WHALB[0331] | muscle protein 20-like protein [Anoplophora glabripennis] | AAY68367 | E-101
WHALB003-14 | PREDICTED: similar to CG1019-PA, isoform A isoform 1 [Tribolium castaneum] | XP_967871 | 2.00E-69

Nervous system development
WHALB[0248] a | PREDICTED: similar to CG4264-PA, isoform A isoform 1 [Tribolium castaneum] | XP_966611 | E-104
WHALB[0262] | PREDICTED: similar to CG4944-PB, isoform B isoform 1 [Tribolium castaneum] | XP_968496 | 2.00E-51
WHALB[0269] | PREDICTED: similar to CG4264-PA, isoform A isoform 1 [Tribolium castaneum] | XP_966611 | E-145
WHALB001-27 | PREDICTED: similar to CG10652-PA, isoform A [Tribolium castaneum] | XP_970626 | 6.00E-59
WHALB001-89 | PREDICTED: similar to CG3359-PB, isoform B [Tribolium castaneum] | XP_971500 | 2.00E-17
WHALB002-4 | PREDICTED: similar to CG10339-PA [Tribolium castaneum] | XP_968892 | 2.00E-51
WHALB006-83 | shade CG13478-PB, isoform B [Drosophila melanogaster] | NP_996074 | 5.00E-22
WHALB007-31 | PREDICTED: similar to CG10339-PA [Tribolium castaneum] | XP_968892 | 7.00E-41

Photoreceptor morphogenesis
WHALB002-29 | PREDICTED: similar to CG5771-PB, isoform B [Tribolium castaneum] | XP_973251 | 4.00E-67
WHALB003-62 | putative 14-3-3 protein [Maconellicoccus hirsutus] | ABM55627 | 3.00E-98
WHALB005-67 | PREDICTED: similar to CG10701-PD, isoform D isoform 2 [Tribolium castaneum] | XP_976132 | E-109

Pupation
WHALB004-95 | PREDICTED: similar to CG8669-PA, isoform A isoform 2 [Tribolium castaneum] | XP_975896 | 6.00E-21

a WHALB[0248] represents a "false contig". Singleton WHALB007-9 was retained under this BLASTX match definition. The accession number remains the 
same; however, the E-value returned was 2.00E-104 upon removal of the second EST. 
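
Table 5 reports E-values in two notations, a full form such as 4.00E-89 and a shortened form such as E-108. A minimal Python sketch for reading both and ranking hits is given below. It assumes that the shortened form abbreviates 1E-108; that reading is an interpretation made here, not something stated in the table, and the helper name is hypothetical.

```python
def parse_evalue(text):
    """Convert an E-value string such as '4.00E-89' or 'E-108' to a float."""
    cleaned = text.replace(" ", "").upper()
    if cleaned.startswith("E"):
        # Assumption: a bare exponent such as 'E-108' abbreviates 1E-108.
        cleaned = "1" + cleaned
    return float(cleaned)


# Three rows taken from Table 5 (identifier, E-value as printed).
hits = [
    ("WHALB002-36", "E-108"),
    ("WHALB004-32", "4.00E-89"),
    ("WHALB[0273]", "E-139"),
]

# Smaller E-values indicate stronger matches, so sort in ascending order.
ranked = sorted(hits, key=lambda hit: parse_evalue(hit[1]))
ranked_ids = [identifier for identifier, _ in ranked]
```

Stray spaces inside a value are removed before conversion, which also tolerates spacing introduced when the table is copied from a formatted source.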



glabripennis library. For example, WHALB002-14 showed 
significant sequence similarity to the coat protein complex II 
(COPII) small G protein Sar1. In 2005, Abrams and 
Andrew found that mutations of this gene in Drosophila 
resulted in a range of cuticle defects including reduced 
cuticle length and pigmentation as well as changes in 
ventral denticle and dorsal hair morphology. 

Two ESTs aligned to form WHALB[0273], a contiguous 
sequence with homology to D. melanogaster yellow-f2 
[CG8063-PA]. This enzyme plays a major role in melanization 
reactions that may contribute to sclerotization/tanning 
of the late-stadia or adult insect cuticle (Han et 
al. 2002). In addition, WHALB004-95 and 
WHALB007-12 returned matches to cryptocephal (crc) 
[CG8669-PA, isoform A isoform 2] and glucose dehydrogenase 
(Gld), respectively. Gene expression and deletion 
studies have shown that both Gld and crc act either in response 
to the late larval ecdysteroid pulse or in the regulation 
of ecdysone biosynthesis/secretion during the onset 
of pupariation (Andres et al. 1993; Hewes et al. 2000). 

Imaginal disc morphogenesis 

WHALB002-13 closely resembled D. melanogaster twins 
(tws) [CG6235-PA, isoform A], with 90% identity and 
97% positives. This gene product was originally 
discovered via a P-element mutation that induced the 
formation of extra anlagen in the posterior compartment 
of the wing disc of Drosophila (Uemura et al. 1993). This 
phenomenon of precursor duplication illustrates the importance 
of phosphorylation and dephosphorylation 
events in the regulation of tissue pattern specification, not 
only in relation to imaginal disc morphogenesis but also 
with regard to several other crucial developmental processes, 
including maturation of the peripheral nervous system 
and determination of photoreceptor fate in the compound 
eye. In 2004, Bajpai et al. further demonstrated 
that Dmel\tws, which codes for the B/PR55 regulatory 
subunit of protein phosphatase 2A (PP2A), functions 
as a positive regulator of Wg/Wnt signaling. This signal 
transduction pathway was also linked to singletons 
WHALB001-37, WHALB004-34, and WHALB00-94, 
although their role(s) in insect development may involve 
alternate biological processes such as fatty-acid/retinoid 
binding and lipid transport. 

Photoreceptor morphogenesis 

WHALB002-29 possessed sequence similarity to D. 
melanogaster Rab11 [CG5771-PB, isoform B], a small 
GTPase implicated in a variety of trafficking events associated 
with photoreceptor terminal differentiation, including 
colocalization with rhodopsin at the base of the rhabdomere, 
formation of multivesicular body (MVB) endosomal 
compartments, and development of specialized 
structures within Garland cells (Satoh et al. 2005). Likewise, 
WHALB005-67 returned a significant BLAST hit 
to Moesin, an integral component in Drosophila photoreceptor 
morphogenesis. Although the singleton represented 
only a partial coding domain, query of the translated 
sequence using RPS-BLAST revealed a portion of the 
N-terminal FERM domain (FERM_C), confirming its placement 
within the Ezrin-Radixin-Moesin (ERM) family of 
proteins. While these proteins are broadly associated with 
actin-based scaffolding, gene disruption studies involving 
RNAi and loss-of-function mutations in Drosophila have 
suggested that Dmel\Moe, in particular, is essential for 
proper assembly of the apical membrane skeleton that 
supports the microvillar array of the rhabdomere 
(Karagiosis and Ready 2003). 

Conclusions 

This study represents the first investigation of the 
transcriptome of A. glabripennis. The resultant sequence 
data have been made available to the public and 
catalogued according to a controlled vocabulary to facilitate 
use of the dataset in future studies. Further, several 
transcripts specific to A. glabripennis that may be involved 
in growth and morphogenesis have been identified. 
Collectively, these sequences provide a strong 
foundation for functional genomics studies that will enable 
the development of more biorational control measures 
to combat this invasive pest. 



Disclaimer 

The use or mention of a trademark or proprietary 
product does not constitute an endorsement, guarantee, 
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